Near-optimal irrevocable sample selection for periodic data streams with applications to marine robotics
We consider the task of monitoring spatiotemporal phenomena in real-time by
deploying limited sampling resources at locations of interest irrevocably and
without knowledge of future observations. This task can be modeled as an
instance of the classical secretary problem. Although this problem has been
studied extensively in theoretical domains, existing algorithms require that
data arrive in random order to provide performance guarantees. These algorithms
will perform arbitrarily poorly on data streams such as those encountered in
robotics and environmental monitoring domains, which tend to have
spatiotemporal structure. We focus on the problem of selecting representative
samples from phenomena with periodic structure and introduce a novel sample
selection algorithm that recovers a near-optimal sample set according to any
monotone submodular utility function. We evaluate our algorithm on a seven-year
environmental dataset collected at the Martha's Vineyard Coastal Observatory
and show that it selects phytoplankton sample locations that are nearly optimal
in an information-theoretic sense for predicting phytoplankton concentrations
in locations that were not directly sampled. The proposed periodic secretary
algorithm can be used with theoretical performance guarantees in many real-time
sensing and robotics applications for streaming, irrevocable sample selection
from periodic data streams.
Comment: 8 pages, accepted for presentation in IEEE Int. Conf. on Robotics and
Automation, ICRA '18, Brisbane, Australia, May 2018
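For context on the random-order assumption mentioned in the abstract above, here is a minimal sketch of the classical single-choice secretary rule. It is not the periodic secretary algorithm proposed in the paper, and the function name `classical_secretary` is hypothetical. The rule observes roughly the first n/e items without selecting, then irrevocably takes the first later item that beats everything seen so far.

```python
import math


def classical_secretary(scores):
    """Classical 1/e stopping rule (textbook baseline, not the paper's method).

    Returns the index of the single item selected from the stream `scores`.
    """
    n = len(scores)
    if n == 0:
        return None
    # Observation phase: look at about n/e items without selecting any.
    cutoff = int(n / math.e)
    best_seen = max(scores[:cutoff], default=float("-inf"))
    # Selection phase: irrevocably take the first item that beats the benchmark.
    for i in range(cutoff, n):
        if scores[i] > best_seen:
            return i
    # Nothing beat the benchmark, so the last item is taken by default.
    return n - 1


# A structured (non-random) stream defeats the rule: on a steadily
# decreasing stream the best item falls in the observation phase.
decreasing = list(range(10, 0, -1))
picked = classical_secretary(decreasing)
```

On the decreasing stream the rule ends up with the last and worst item, which illustrates why guarantees that rely on random arrival order do not carry over to streams with spatiotemporal structure.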
Feature discovery and visualization of robot mission data using convolutional autoencoders and Bayesian nonparametric topic models
The gap between our ability to collect interesting data and our ability to
analyze these data is growing at an unprecedented rate. Recent algorithmic
attempts to fill this gap have employed unsupervised tools to discover
structure in data. Some of the most successful approaches have used
probabilistic models to uncover latent thematic structure in discrete data.
Despite the success of these models on textual data, they have not generalized
as well to image data, in part because of the spatial and temporal structure
that may exist in an image stream.
We introduce a novel unsupervised machine learning framework that
incorporates the ability of convolutional autoencoders to discover features
from images that directly encode spatial information, within a Bayesian
nonparametric topic model that discovers meaningful latent patterns within
discrete data. By using this hybrid framework, we overcome the fundamental
dependency of traditional topic models on rigidly hand-coded data
representations, while simultaneously encoding spatial dependency in our topics
without adding model complexity. We apply this model to the motivating
application of high-level scene understanding and mission summarization for
exploratory marine robots. Our experiments on a seafloor dataset collected by a
marine robot show that the proposed hybrid framework outperforms current
state-of-the-art approaches on the task of unsupervised seafloor terrain
characterization.
Comment: 8 pages
Balancing exploration and exploitation: task-targeted exploration for scientific decision-making
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution September 2022.
How do we collect observational data that reveal fundamental properties of scientific phenomena? This is a key challenge in modern scientific discovery. Scientific phenomena are complex: they have high-dimensional and continuous state, exhibit chaotic dynamics, and generate noisy sensor observations. Additionally, scientific experimentation often requires significant time, money, and human effort. In the face of these challenges, we propose to leverage autonomous decision-making to augment and accelerate human scientific discovery.
Autonomous decision-making in scientific domains faces an important and classical challenge: balancing exploration and exploitation when making decisions under uncertainty. This thesis argues that efficient decision-making in real-world, scientific domains requires task-targeted exploration—exploration strategies that are tuned to a specific task. By quantifying the change in task performance due to exploratory actions, we enable decision-makers that can contend with highly uncertain real-world environments, performing exploration parsimoniously to improve task performance.
The thesis presents three novel paradigms for task-targeted exploration that are motivated by and applied to real-world scientific problems. We first consider exploration in partially observable Markov decision processes (POMDPs) and present two novel planners that leverage task-driven information measures to balance exploration and exploitation. These planners drive robots in simulation and oceanographic field trials to robustly identify plume sources and track targets with stochastic dynamics. We next consider the exploration-exploitation trade-off in online learning paradigms, a robust alternative to POMDPs when the environment is adversarial or difficult to model. We present novel online learning algorithms that balance exploitative and exploratory plays optimally under real-world constraints, including delayed feedback, partial predictability, and short regret horizons. We use these algorithms to perform model selection for subseasonal temperature and precipitation forecasting, achieving state-of-the-art forecasting accuracy.
The human scientific endeavor is poised to benefit from our emerging capacity to integrate observational data into the process of model development and validation. Realizing the full potential of these data requires autonomous decision-makers that can contend with the inherent uncertainty of real-world scientific domains. This thesis highlights the critical role that task-targeted exploration plays in efficient scientific decision-making and proposes three novel methods to achieve task-targeted exploration in real-world oceanographic and climate science applications.
This material is based upon work supported by the NSF Graduate Research Fellowship Program and a Microsoft Research PhD Fellowship, as well as the Department of Energy / National Nuclear Security Administration under Award Number DE-NA0003921, the Office of Naval Research under Award Number N00014-17-1-2072, and DARPA under Award Number HR001120C0033.
Statistical models and decision making for robotic scientific information gathering
Submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution September 2018.
Mobile robots and autonomous sensors have seen increasing use in scientific applications, from planetary rovers surveying for signs of life on Mars, to environmental buoys measuring and logging oceanographic conditions in coastal regions. This thesis makes contributions in both planning algorithms and model design for autonomous scientific information gathering, demonstrating how theory from machine learning, decision theory, theory of optimal experimental design, and statistical inference can be used to develop online algorithms for robotic information gathering that are robust to modeling errors, account for spatiotemporal structure in scientific data, and have probabilistic performance guarantees.
This thesis first introduces a novel sample selection algorithm for online, irrevocable sampling in data streams that have spatiotemporal structure, such as those that commonly arise in robotics and environmental monitoring. Given a limited sampling capacity, the proposed periodic secretary algorithm uses an information-theoretic reward function to select samples in real-time that maximally reduce posterior uncertainty in a given scientific model. Additionally, we provide a lower bound on the quality of samples selected by the periodic secretary algorithm by leveraging the submodularity of the information-theoretic reward function. Finally, we demonstrate the robustness of the proposed approach by employing the periodic secretary algorithm to select samples irrevocably from a seven-year oceanographic data stream collected at the Martha’s Vineyard Coastal Observatory off the coast of Cape Cod, USA.
Secondly, we consider how scientific models can be specified in environments – such as the deep sea or deep space – where domain scientists may not have enough a priori knowledge to formulate a formal scientific model and hypothesis. These domains require scientific models that start with very little prior information and construct a model of the environment online as observations are gathered. We propose unsupervised machine learning as a technique for science model-learning in these environments. To this end, we introduce a hybrid Bayesian-deep learning model that learns a nonparametric topic model of a visual environment. We use this semantic visual model to identify observations that are poorly explained in the current model, and show experimentally that these highly perplexing observations often correspond to scientifically interesting phenomena. On a marine dataset collected by the SeaBED AUV on the Hannibal Sea Mount, images of high perplexity in the learned model corresponded, for example, to a scientifically novel crab congregation in the deep sea.
The approaches presented in this thesis capture the depth and breadth of the problems facing the field of autonomous science. Developing robust autonomous systems that enhance our ability to perform exploratory science in environments such as the oceans, deep space, agricultural and disaster-relief zones will require insight and techniques from classical areas of robotics, such as motion and path planning, mapping, and localization, and from other domains, including machine learning, spatial statistics, optimization, and theory of experimental design. This thesis demonstrates how theory and practice from these diverse disciplines can be unified to address problems in autonomous scientific information gathering.
Statistical models and decision making for robotic scientific information gathering
Thesis: S.M., Joint Program in Applied Ocean Physics and Engineering (Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science; and the Woods Hole Oceanographic Institution), 2018.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 97-107).
By Genevieve Elaine Flaspohler. S.M.
Adaptive Bias Correction for Improved Subseasonal Forecasting
Subseasonal forecasting, predicting temperature and
precipitation 2 to 6 weeks ahead, is critical for effective
water allocation, wildfire management, and drought and flood mitigation. Recent
international research efforts have advanced the subseasonal capabilities of
operational dynamical models, yet temperature and precipitation prediction
skill remains poor, partly due to stubborn errors in representing atmospheric
dynamics and physics inside dynamical models. To counter these errors, we
introduce an adaptive bias correction (ABC) method that combines
state-of-the-art dynamical forecasts with observations using machine learning.
When applied to the leading subseasonal model from the European Centre for
Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting
skill by 60-90% and precipitation forecasting skill by 40-69% in the contiguous
U.S. We couple these performance improvements with a practical workflow, based
on Cohort Shapley, for explaining ABC skill gains and identifying higher-skill
windows of opportunity based on specific climate conditions.
Comment: 16 pages of main paper and 2 pages of appendix text
Streaming Scene Maps for Co-Robotic Exploration in Bandwidth Limited Environments
This paper proposes a bandwidth tunable technique for real-time probabilistic
scene modeling and mapping to enable co-robotic exploration in communication
constrained environments such as the deep sea. The parameters of the system
enable the user to characterize the scene complexity represented by the map,
which in turn determines the bandwidth requirements. The approach is
demonstrated using an underwater robot that learns an unsupervised scene model
of the environment and then uses this scene model to communicate the spatial
distribution of various high-level semantic scene constructs to a human
operator. Preliminary experiments in an artificially constructed tank
environment as well as simulated missions over a 10 m × 10 m coral reef
using real data show the tunability of the maps to different bandwidth
constraints and science interests. To our knowledge this is the first paper to
quantify how the free parameters of the unsupervised scene model impact both
the scientific utility of and bandwidth required to communicate the resulting
scene model.
Comment: 8 pages, 6 figures, accepted for presentation in IEEE Int. Conf. on
Robotics and Automation, ICRA '19, Montreal, Canada, May 2019
Online Learning with Optimism and Delay
Inspired by the demands of real-time climate and weather forecasting, we
develop optimistic online learning algorithms that require no parameter tuning
and have optimal regret guarantees under delayed feedback. Our algorithms --
DORM, DORM+, and AdaHedgeD -- arise from a novel reduction of delayed online
learning to optimistic online learning that reveals how optimistic hints can
mitigate the regret penalty caused by delay. We pair this delay-as-optimism
perspective with a new analysis of optimistic learning that exposes its
robustness to hinting errors and a new meta-algorithm for learning effective
hinting strategies in the presence of delay. We conclude by benchmarking our
algorithms on four subseasonal climate forecasting tasks, demonstrating low
regret relative to state-of-the-art forecasting models.
Comment: ICML 2021. 9 pages of main paper and 26 pages of appendix text
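To make the delayed-feedback setting in the abstract above concrete, here is a minimal sketch of the standard Hedge (exponential weights) update when losses arrive late. This is a textbook baseline, not DORM, DORM+, or AdaHedgeD, and it uses no optimistic hints. The function name `delayed_hedge`, the fixed learning rate `eta`, and the constant `delay` are all assumptions made for illustration.

```python
import math


def delayed_hedge(losses, delay, eta):
    """Plain Hedge under a fixed feedback delay (illustrative baseline only).

    losses[t][i] is the loss of expert i in round t. The loss vector of
    round s only becomes visible after round s + delay has been played.
    Returns the list of probability vectors played in each round.
    """
    n_experts = len(losses[0])
    observed = [0.0] * n_experts  # cumulative loss that has been revealed so far
    played = []
    for t in range(len(losses)):
        # Exponential weights on the observed losses (shifted for stability).
        shift = min(observed)
        weights = [math.exp(-eta * (c - shift)) for c in observed]
        total = sum(weights)
        played.append([w / total for w in weights])
        # After playing round t, the loss from round t - delay is revealed.
        s = t - delay
        if s >= 0:
            for i in range(n_experts):
                observed[i] += losses[s][i]
    return played


# Expert 0 is always better, but with delay=2 the learner cannot
# react until round 3.
plays = delayed_hedge([[0.0, 1.0]] * 5, delay=2, eta=1.0)
```

With a delay of two rounds, the learner keeps playing the uniform distribution for the first three rounds, which is the kind of regret penalty that delay introduces.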
Quantifying the swimming gaits of veined squid (Loligo forbesii) using bio-logging tags
Author Posting. © Company of Biologists, 2019. This article is posted here by permission of Company of Biologists for personal use, not for redistribution. The definitive version was published in Journal of Experimental Biology 222 (2019):jeb.198226, doi: 10.1242/jeb.198226.
Squid are mobile, diverse, ecologically important marine organisms whose behavior and habitat use can have substantial impacts on ecosystems and fisheries. However, as a consequence in part of the inherent challenges of monitoring squid in their natural marine environment, fine-scale behavioral observations of these free-swimming, soft-bodied animals are rare. Bio-logging tags provide an emerging way to remotely study squid behavior in their natural environments. Here, we applied a novel, high-resolution bio-logging tag (ITAG) to seven veined squid, Loligo forbesii, in a controlled experimental environment to quantify their short-term (24 h) behavioral patterns. Tag accelerometer, magnetometer and pressure data were used to develop automated gait classification algorithms based on overall dynamic body acceleration, and a subset of the events were assessed and confirmed using concurrently collected video data. Finning, flapping and jetting gaits were observed, with the low-acceleration finning gaits detected most often. The animals routinely used a finning gait to ascend (climb) and then glide during descent with fins extended in the tank's water column, a possible strategy to improve swimming efficiency for these negatively buoyant animals. Arms- and mantle-first directional swimming were observed in approximately equal proportions, and the squid were slightly but significantly more active at night. These tag-based observations are novel for squid and indicate a more efficient mode of movement than suggested by some previous observations. The combination of sensing, classification and estimation developed and applied here will enable the quantification of squid activity patterns in the wild to provide new biological information, such as in situ identification of behavioral states, temporal patterns, habitat requirements, energy expenditure and interactions of squid through space–time in the wild.
This work was supported by Woods Hole Oceanographic Institution’s Ocean Life Institute and the Innovative Technology Program, Hopkins Marine Station’s Marine Life Observatory (to K.K.), as well as the National Science Foundation Program for Instrument Development for Biological Research (award no. 1455593 to T.A.M., K.K. and K.A.S.). F.C. thanks the President’s International Fellowship Initiative (PIFI) of the Chinese Academy of Sciences. G.E.F. thanks the National Science Foundation GRFP and National Science Foundation REU programs for support of this research.
2020-10-2
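The gait classification described above is based on overall dynamic body acceleration (ODBA). As a minimal sketch, ODBA is commonly computed by subtracting a smoothed (static) component from each accelerometer axis and summing the absolute residuals over the three axes. The centered running mean, the `window` length, and the function name `odba` are assumptions for illustration. The abstract does not give the paper's exact processing or its gait thresholds, so neither is reproduced here.

```python
def odba(ax, ay, az, window):
    """Overall dynamic body acceleration for each sample (illustrative sketch).

    ax, ay, az: equal-length lists of raw acceleration per axis.
    window: number of samples in the centered running mean used to
            estimate the static component (assumed; truncated at the edges).
    """
    half = window // 2

    def running_mean(x):
        smoothed = []
        for i in range(len(x)):
            segment = x[max(0, i - half): i + half + 1]
            smoothed.append(sum(segment) / len(segment))
        return smoothed

    axes = (ax, ay, az)
    static = [running_mean(a) for a in axes]
    # Dynamic acceleration is raw minus static; ODBA sums its magnitude over axes.
    return [
        sum(abs(a[i] - s[i]) for a, s in zip(axes, static))
        for i in range(len(ax))
    ]


# A single burst on one axis produces a peak in ODBA around the burst.
burst = odba([0, 0, 3, 0, 0], [0] * 5, [0] * 5, window=3)
```

A constant signal, such as a tag at rest, yields zero ODBA at every sample, so sustained low values are consistent with low-acceleration behavior.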